Energy Functional and Fixed Points of a Neural Network

Leonid B. Litinskii 

High Pressure Physics Institute of Russian Academy of Sciences 

Russia, 142092 Troitsk, Moscow region, e-mail: litin@hppi.troitsk.ru

It turns out that the set of fixed points is not necessarily the same as the set of local minima of the energy functional; this depends on the diagonal elements of the connection matrix. A simple method is found that cuts off fictitious fixed points with high energies. The method is especially effective if the connection matrix is a projection matrix.

I. INTRODUCTION

We define a neural network as a dynamic system of $n$ spin variables (spins), each of which can take one of two values:

$$\sigma_i = \{\pm 1\}, \quad i = 1,2,\ldots,n. \tag{1}$$



The spins are connected by a symmetric connection matrix $J = (J_{ij})$:

$$J_{ij} = J_{ji}, \quad i,j = 1,2,\ldots,n.$$

The local potential

$$h_i(t) = \sum_{j=1}^{n} J_{ij}\sigma_j(t), \tag{2}$$

with which the network acts on spin $i$, solely determines the value of spin $i$ at time $t+1$:

$$\sigma_i(t+1) = \begin{cases} \sigma_i(t), & \text{if } h_i(t)\sigma_i(t) \ge 0, \\ -\sigma_i(t), & \text{if } h_i(t)\sigma_i(t) < 0. \end{cases} \tag{3}$$

The state of the network as a whole is described by a configuration vector $\vec\sigma$ whose coordinates are given by Eq. (1). In what follows, Greek letters will be used to designate configuration vectors.

We want to investigate the set of so-called fixed points of the network, i.e. those states $\vec\sigma^*$ for which all coordinates $\sigma_i^*$ satisfy

$$\sigma_i^* h_i \ge 0, \quad i = 1,2,\ldots,n. \tag{4}$$
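The update rule (3) and the fixed-point condition (4) can be sketched in a few lines of Python. This is only an illustration: the function names and the 3-spin matrix are hypothetical, the local potential is taken as $h_i = \sum_j J_{ij}\sigma_j$, and the spins are updated one after another (sequential dynamics), which is an assumption about the update order.

```python
from typing import List, Sequence

def local_fields(J: Sequence[Sequence[float]], s: Sequence[int]) -> List[float]:
    """Local potentials h_i = sum_j J_ij * s_j."""
    return [sum(J_ij * s_j for J_ij, s_j in zip(row, s)) for row in J]

def is_fixed_point(J: Sequence[Sequence[float]], s: Sequence[int]) -> bool:
    """Condition (4): s_i * h_i >= 0 for every spin i."""
    return all(s_i * h_i >= 0 for s_i, h_i in zip(s, local_fields(J, s)))

def sequential_sweep(J: Sequence[Sequence[float]], s: Sequence[int]) -> List[int]:
    """Apply rule (3) to the spins one after another (assumed update order)."""
    s = list(s)
    for i in range(len(s)):
        h_i = sum(J[i][j] * s[j] for j in range(len(s)))
        if h_i * s[i] < 0:  # the spin disagrees with its local potential
            s[i] = -s[i]
    return s

# Hypothetical 3-spin example with a symmetric, zero-diagonal matrix.
J = [[0, 1, -1],
     [1, 0, -1],
     [-1, -1, 0]]
state = [1, -1, 1]
while not is_fixed_point(J, state):
    state = sequential_sweep(J, state)
print(state)  # [-1, -1, 1]
```

For this zero-diagonal matrix every flip lowers the energy, so the loop is guaranteed to stop; with negative diagonal elements that guarantee is lost.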

Besides neural network theory, the mathematical model (1)-(3) is also used in factor analysis [?] and in the Ising model [?]. Neural network theory makes use of the physical concept of the energy of the state $\vec\sigma$, which is defined as

$$E(\vec\sigma) = -\frac{1}{n}\sum_{i=1}^{n}\sigma_i h_i = -\frac{1}{n}\sum_{i,j=1}^{n} J_{ij}\sigma_i\sigma_j. \tag{5}$$

It is very important, both from the physical point of view and for the ability of the network to serve as a content-addressable memory [?], that the energy of the state decrease at every step of the network evolution and, moreover, that the fixed points be local minima of the energy functional (5).

In the second section, we obtain the conditions under which a connection matrix $J$ guarantees that the above requirements are met. It turns out that the Hebb connection matrix, as well as the connection matrices used in physical problems, possess the necessary property. But this is not the case for the projection matrix [?]. As a result, a network with such a connection matrix has a set of fixed points which is wider than the set of local minima of the energy functional (5). In the third section, we show how the situation for a network with a projection matrix can be improved.



Notations. In what follows, a network with a connection matrix $J$ is called a $J$-network. We denote by $FP(J)$ the set of all fixed points of the $J$-network. A configuration vector which is a fixed point of a network carries the superscript "*":

$$\vec\sigma^* \in FP(J).$$

We denote by $LM(J)$ the set of local minima of the energy functional (5). To examine the local minima of the functional (5), we introduce a topology on the set of configuration vectors: the set of $n$ configuration vectors $\vec\sigma^{(l)}$ which are nearest to the vector $\vec\sigma$ in the sense of the Hamming distance is called the vicinity of the state $\vec\sigma$:

$$\vec\sigma^{(l)} = (\sigma_1, \sigma_2, \ldots, -\sigma_l, \ldots, \sigma_n), \quad l = 1,2,\ldots,n.$$

In other words, the state $\vec\sigma^{(l)}$ from the vicinity of the state $\vec\sigma$ differs from the latter only by the opposite value of the $l$th spin. Accordingly,

$$\vec\sigma \in LM(J) \iff E(\vec\sigma) \le E(\vec\sigma^{(l)}), \quad l = 1,2,\ldots,n.$$
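The definition of a local minimum over the single-flip vicinity translates directly into code. The sketch below assumes the energy (5) with the prefactor $1/n$ and a non-strict inequality in the definition of $LM(J)$; the function names and the 3-spin matrix are illustrative only.

```python
from typing import List, Sequence

def energy(J: Sequence[Sequence[float]], s: Sequence[int]) -> float:
    """Energy (5): E = -(1/n) * sum_ij J_ij s_i s_j."""
    n = len(s)
    return -sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n)) / n

def flip(s: Sequence[int], l: int) -> List[int]:
    """The neighbour of s that differs from it only in the l-th spin."""
    t = list(s)
    t[l] = -t[l]
    return t

def is_local_minimum(J: Sequence[Sequence[float]], s: Sequence[int]) -> bool:
    """E(s) <= E(s with spin l flipped) for every l."""
    e = energy(J, s)
    return all(e <= energy(J, flip(s, l)) for l in range(len(s)))

# Hypothetical 3-spin example.
J = [[0, 1, -1],
     [1, 0, -1],
     [-1, -1, 0]]
print(energy(J, [-1, -1, 1]))            # -2.0
print(is_local_minimum(J, [-1, -1, 1]))  # True
print(is_local_minimum(J, [1, -1, 1]))   # False
```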

Finally, a matrix with zero diagonal elements is marked by the superscript "(0)":

$$J^{(0)} \iff J_{ii} = 0, \quad i = 1,2,\ldots,n.$$

II. ON THE ROLE OF THE DIAGONAL ELEMENTS 

Theorem 1 

1. The set of local minima of the energy functional (5) does not depend on the values of the diagonal elements of the connection matrix:

$$LM(J) = LM(J + A),$$

where

$$A = \mathrm{diag}\{a_{11}, a_{22}, \ldots, a_{nn}\} \tag{6}$$

and all the elements $a_{ii}$ are arbitrary real numbers.

2. For a connection matrix $J^{(0)}$ with zero diagonal elements, the set of fixed points coincides with the set of local minima of the energy functional:

$$FP(J^{(0)}) = LM(J^{(0)}). \tag{7}$$

3. Let all the elements $a_{ii}$ of the diagonal matrix (6) be positive; then

$$FP(J^{(0)} + A) \supseteq FP(J^{(0)}) \supseteq FP(J^{(0)} - A). \tag{8}$$

The proof of Theorem 1: Let us write the energy of the state $\vec\sigma$, extracting the contribution of the $l$th spin. Up to a positive factor we obtain:

$$E(\vec\sigma) \propto -\sum_{i,j \neq l} J_{ij}\sigma_i\sigma_j + J_{ll} - 2\sigma_l h_l. \tag{9}$$
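The decomposition (9) can be checked term by term. The following worked steps assume the local potential $h_l = \sum_j J_{lj}\sigma_j$ and use only the symmetry $J_{lj} = J_{jl}$ and $\sigma_l^2 = 1$:

```latex
\begin{align*}
\sum_{i,j=1}^{n} J_{ij}\sigma_i\sigma_j
  &= \sum_{i,j \neq l} J_{ij}\sigma_i\sigma_j
     + 2\sigma_l \sum_{j \neq l} J_{lj}\sigma_j + J_{ll}, \\
\sigma_l h_l
  &= \sigma_l \sum_{j \neq l} J_{lj}\sigma_j + J_{ll}, \\
\sum_{i,j=1}^{n} J_{ij}\sigma_i\sigma_j
  &= \sum_{i,j \neq l} J_{ij}\sigma_i\sigma_j + 2\sigma_l h_l - J_{ll}.
\end{align*}
```

Changing the overall sign gives Eq. (9). Flipping $\sigma_l$ leaves the first sum and $J_{ll}$ untouched and reverses the sign of $\sigma_l \sum_{j \neq l} J_{lj}\sigma_j = \sigma_l h_l - J_{ll}$, so the energy difference between the flipped state and the original one is proportional to $\sigma_l h_l - J_{ll}$.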

The state $\vec\sigma$ is a local minimum of the energy functional if and only if the system of inequalities (10) is fulfilled for all states $\vec\sigma^{(l)}$ from the vicinity of the state $\vec\sigma$:

$$E(\vec\sigma^{(l)}) - E(\vec\sigma) \propto \sigma_l h_l - J_{ll} = \sigma_l \sum_{j \neq l} J_{lj}\sigma_j \ge 0, \quad l = 1,2,\ldots,n. \tag{10}$$

It is evident that the inequalities (10) do not depend on the values of the diagonal elements: the conditions for their fulfillment are defined by the off-diagonal part of the matrix $J$. This proves the first item of the Theorem. Moreover, it follows from the inequalities (10) that for nonnegative $J_{ll}$ any local minimum of the energy functional is also a fixed point of the network:

$$LM(J) \subseteq FP(J) \quad \text{when } J_{ll} \ge 0, \quad l = 1,2,\ldots,n. \tag{11}$$

Next, let a state $\vec\sigma^*$ be a fixed point of the network: $\sigma_l^* h_l \ge 0$, $l = 1,2,\ldots,n$. With the help of Eq. (9) we obtain:

$$E(\vec\sigma^{(l)}) - E(\vec\sigma^*) \propto \sigma_l^* h_l - J_{ll} \; \begin{cases} \ge 0, & \text{if } J_{ll} = 0, \\ \gtrless 0, & \text{if } J_{ll} > 0, \\ > 0, & \text{if } J_{ll} < 0. \end{cases}$$

In other words, for nonpositive $J_{ll}$ any fixed point of the network is a local minimum of the energy functional:

$$LM(J) \supseteq FP(J) \quad \text{when } J_{ll} \le 0, \quad l = 1,2,\ldots,n. \tag{12}$$

Combined with the first item of the Theorem, Eqs. (11) and (12) justify Eqs. (7) and (8). This completes the proof of the Theorem.

In fact, Theorem 1 makes it possible, to some extent, to regulate the set of fixed points of the network. Let us explain this statement. If the matrix $J^{(0)}$ is transformed by the diagonal matrix $A$, $J(A) = J^{(0)} + A$, then in accordance with Theorem 1 the set of local minima of the energy functional does not change. But this transformation affects the set of fixed points of the $J(A)$-network. Indeed, if all the matrix elements $a_{ii}$ are positive, the set of $J(A)$-network fixed points expands as compared with $FP(J^{(0)})$, owing to the appearance of new fixed points which are not local minima. If, on the contrary, all the matrix elements $a_{ii}$ are negative, the set of fixed points of the $J(A)$-network narrows as compared with $FP(J^{(0)})$: some states, while remaining local minima of the energy functional, cease to be fixed points.
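Theorem 1 can be checked by brute force on a small network. The sketch below enumerates all $2^n$ states of a hypothetical 3-spin matrix and shifts its diagonal by $\pm 3$; the helper names and the example are illustrative assumptions, with the energy taken as in Eq. (5).

```python
from itertools import product

def energy(J, s):
    """Energy (5): E = -(1/n) * sum_ij J_ij s_i s_j."""
    n = len(s)
    return -sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n)) / n

def fixed_points(J):
    """All states with s_i * h_i >= 0 for every i, found by enumeration."""
    n = len(J)
    return {s for s in product((1, -1), repeat=n)
            if all(s[i] * sum(J[i][j] * s[j] for j in range(n)) >= 0
                   for i in range(n))}

def local_minima(J):
    """All states whose energy does not exceed that of any single-flip neighbour."""
    n = len(J)
    def flipped(s, l):
        return s[:l] + (-s[l],) + s[l + 1:]
    return {s for s in product((1, -1), repeat=n)
            if all(energy(J, s) <= energy(J, flipped(s, l)) for l in range(n))}

def add_diagonal(J, a):
    """Return J + diag(a)."""
    n = len(J)
    return [[J[i][j] + (a[i] if i == j else 0) for j in range(n)] for i in range(n)]

J0 = [[0, 1, -1],
      [1, 0, -1],
      [-1, -1, 0]]
J_plus = add_diagonal(J0, [3, 3, 3])
J_minus = add_diagonal(J0, [-3, -3, -3])

print(sorted(local_minima(J0)))    # [(-1, -1, 1), (1, 1, -1)]
print(len(fixed_points(J_plus)))   # 8
print(len(fixed_points(J0)))       # 2
print(len(fixed_points(J_minus)))  # 0
```

In this example the positive shift turns every state into a fixed point and the negative shift removes all of them, while the two local minima stay the same in all three cases.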

The last statement suggests a simple method for eliminating unnecessary fixed points of the network. Let us formulate it in the form of a theorem.

Theorem 2 

Let the fixed points of a $J^{(0)}$-network be numbered in such a way that

$$E(\vec\sigma^{(1)}) < E(\vec\sigma^{(2)}) < \ldots < E(\vec\sigma^{(k)}) < E(\vec\sigma^{(k+1)}) < \ldots$$

(to simplify the notation, here we omit the superscript "*" on the fixed points). Let $A$ be a diagonal matrix whose elements are defined by the equalities

$$a_{ii} = \min_{1 \le l \le k} \sigma_i^{(l)} h_i(\vec\sigma^{(l)}), \quad i = 1,2,\ldots,n, \tag{13}$$

where $h_i(\vec\sigma)$ is the potential (2) which acts on the $i$th spin in the $J^{(0)}$-network. Then

$$FP(J^{(0)} - A) = \{\vec\sigma^{(1)}, \vec\sigma^{(2)}, \ldots, \vec\sigma^{(k)}\}.$$

The proof of Theorem 2: Since all $a_{ii}$ from Eq. (13) are positive, the set of $(J^{(0)} - A)$-network fixed points can, due to Theorem 1, only be narrower than the set of $J^{(0)}$-network fixed points. From the definition (4) of a fixed point, it follows that the state $\vec\sigma^{(l)}$ is a $(J^{(0)} - A)$-network fixed point if and only if the system of inequalities

$$\sigma_i^{(l)} h_i(\vec\sigma^{(l)}) - a_{ii} \ge 0, \quad i = 1,2,\ldots,n \tag{14}$$

is fulfilled. Taking the definition (13) into account, it is evident that for any fixed point $\vec\sigma^{(l)}$ with $l \le k$ the system of inequalities (14) is fulfilled. Consequently, the first $k$ states $\vec\sigma^{(l)}$ are $(J^{(0)} - A)$-network fixed points. On the other hand, proceeding from Eq. (5) for the energy and taking into account that the $J^{(0)}$-network fixed points are strictly ordered with respect to the energy increase, it is easy to see that for at least one of the coordinates of a state $\vec\sigma^{(l)}$ with $l > k$ the inequalities (14) are not fulfilled. Consequently, the states $\vec\sigma^{(l)}$ with $l > k$ are not $(J^{(0)} - A)$-network fixed points.
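The construction in Theorem 2 can be sketched as follows: enumerate the fixed points of the zero-diagonal network, keep the $k$ lowest in energy, build $A$ from Eq. (13), and subtract it from the diagonal. The 4-spin matrix (two strongly coupled pairs, weakly coupled to each other) and all function names are hypothetical. In this example the two kept states have equal energy, so only the gap between the $k$th and $(k+1)$th states is strict.

```python
from itertools import product

def fields(J, s):
    """Local potentials h_i = sum_j J_ij s_j."""
    return [sum(J_ij * s_j for J_ij, s_j in zip(row, s)) for row in J]

def energy(J, s):
    """Energy (5): E = -(1/n) * sum_i s_i h_i."""
    return -sum(s_i * h_i for s_i, h_i in zip(s, fields(J, s))) / len(s)

def fixed_points(J):
    """All states satisfying s_i * h_i >= 0, found by enumeration."""
    return [s for s in product((1, -1), repeat=len(J))
            if all(s_i * h_i >= 0 for s_i, h_i in zip(s, fields(J, s)))]

def prune_high_energy(J0, k):
    """Keep the k lowest-energy fixed points of J0 and return (kept, a, J0 - A)."""
    ordered = sorted(fixed_points(J0), key=lambda s: energy(J0, s))
    kept = ordered[:k]
    n = len(J0)
    # Eq. (13): a_ii is the smallest value of s_i * h_i over the kept states.
    a = [min(s[i] * fields(J0, s)[i] for s in kept) for i in range(n)]
    J = [[J0[i][j] - (a[i] if i == j else 0) for j in range(n)] for i in range(n)]
    return kept, a, J

J0 = [[0, 3, 1, 1],
      [3, 0, 1, 1],
      [1, 1, 0, 3],
      [1, 1, 3, 0]]
print(len(fixed_points(J0)))             # 4
kept, a, J_pruned = prune_high_energy(J0, k=2)
print(a)                                 # [5, 5, 5, 5]
print(fixed_points(J_pruned))            # [(1, 1, 1, 1), (-1, -1, -1, -1)]
```

Here the two fixed points with energy $-1$ disappear and the two with energy $-5$ survive. That the kept states remain fixed points follows directly from (14); that no other state survives is verified here only for this example.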

Remark. From Eq. (9) it can easily be shown that, even under sequential dynamics, the evolution of a network whose connection matrix has negative diagonal elements can be accompanied by an energy increase. As a result, even under sequential dynamics limit cycles can form in such a network! From this point of view, connection matrices with negative diagonal elements are absolutely nonphysical. But Theorem 2 gives a simple and effective method for eliminating high-energy fixed points, and in some cases this method can be very useful. In particular, it is well known [?] that for a network with the projection connection matrix the energies of the spurious fixed points are larger than the energies of the memorized patterns. Consequently, all such spurious fixed points can easily be eliminated with the help of Theorem 2.



III. PROJECTION CONNECTION MATRIX 

1). Let

$$\vec\xi^{(l)} = (\xi_1^{(l)}, \xi_2^{(l)}, \ldots, \xi_n^{(l)}), \quad l = 1,2,\ldots,p, \tag{15}$$

be $p$ preassigned configuration vectors which we would like to have as network fixed points (such vectors are usually called the memorized patterns). It is known [?] that this can easily be done if the matrix $P$ of orthogonal projection onto the linear subspace spanned by the $p$ memorized patterns $\vec\xi^{(l)}$ is taken as the connection matrix. The matrix $P$ is symmetric and idempotent: $P^T = P$, $P^2 = P$. Besides, by definition $P\vec\xi^{(l)} = \vec\xi^{(l)}$, $l = 1,2,\ldots,p$. Consequently, the vectors $\vec\xi^{(l)}$ are not only fixed points of the $P$-network, but also provide the global minimum of the energy functional (5):

$$E(\vec\xi^{(l)}) = -\frac{(P\vec\xi^{(l)}, \vec\xi^{(l)})}{n} = -1, \quad l = 1,2,\ldots,p.$$

But it is known from experience that, as a rule, the $P$-network has additional fixed points, which are called spurious fixed points. Their number is much larger than for the network with Hebb's connection matrix. Worst of all, not all the $P$-network fixed points are local minima of the energy functional [?]. Theorem 1 helps to clarify the situation.

2). For simplicity we assume that the $p$ memorized patterns $\vec\xi^{(l)}$ are linearly independent vectors. We introduce a rectangular $(p \times n)$-matrix $\Xi$ whose rows are the memorized patterns $\vec\xi^{(l)}$:

$$\Xi = \begin{pmatrix} \xi_1^{(1)} & \xi_2^{(1)} & \cdots & \xi_n^{(1)} \\ \vdots & \vdots & & \vdots \\ \xi_1^{(p)} & \xi_2^{(p)} & \cdots & \xi_n^{(p)} \end{pmatrix}. \tag{16}$$

Then the matrix of orthogonal projection onto the subspace spanned by the memorized patterns is

$$P = Y\Xi, \tag{17}$$

where $Y$ is the $(n \times p)$-matrix that is the pseudoinverse of the matrix $\Xi$. On the construction of pseudoinverse matrices, see [?]. We only mention that the columns of the matrix $Y$ are $n$-dimensional vectors $\vec y^{(l)}$ such that $(\vec y^{(l)}, \vec\xi^{(l')}) = \delta_{ll'}$, where $\delta_{ll'}$ is the Kronecker symbol; the vectors $\vec y^{(l)}$ are also linearly independent.

The diagonal elements of the matrix $P$ are positive. Indeed, they are equal to the squares of the projections onto this subspace of the $n$-dimensional Cartesian unit vectors $\vec e^{(i)} = (0, \ldots, 1, \ldots, 0)$:

$$P_{ii} = (P\vec e^{(i)}, \vec e^{(i)}) = \|P\vec e^{(i)}\|^2 = d_i^2, \quad i = 1,2,\ldots,n. \tag{18}$$

The values $d_i^2$ must be nonzero; otherwise the vectors $P\vec e^{(i)} = \sum_{l=1}^{p} \xi_i^{(l)} \vec y^{(l)}$, which are sums of linearly independent vectors with nonzero coefficients $\xi_i^{(l)} = \pm 1$, would be zero. Then, according to Theorem 1, the set of $P$-network fixed points will indeed be wider than the set of local minima of the energy functional.
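The construction (17)-(18) can be illustrated with exact rational arithmetic. The sketch assumes linearly independent patterns, for which the pseudoinverse can be written as $Y = \Xi^T(\Xi\Xi^T)^{-1}$; the helper names and the two 4-spin patterns are hypothetical.

```python
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(A):
    """Gauss-Jordan inverse of a nonsingular square matrix, in exact fractions."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        p = M[c][c]
        M[c] = [x / p for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def projection_matrix(Xi):
    """P = Y Xi with Y = Xi^T (Xi Xi^T)^{-1} (rows of Xi assumed independent)."""
    Xt = transpose(Xi)
    Y = matmul(Xt, inverse(matmul(Xi, Xt)))
    return matmul(Y, Xi)

Xi = [[1, 1, 1, 1],
      [1, 1, 1, -1]]
P = projection_matrix(Xi)
n = len(P)
diag = [P[i][i] for i in range(n)]
# Squared length of P e_i, i.e. of the i-th column of P, as in Eq. (18).
col_norms = [sum(P[r][i] ** 2 for r in range(n)) for i in range(n)]
print(diag)  # [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3), Fraction(1, 1)]
```

All diagonal elements come out positive, and each equals the squared length of the corresponding column of $P$.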
If instead of the $P$-network we consider the $P^{(0)}$-network,

$$P^{(0)} = P - D, \quad \text{where } D = \mathrm{diag}(d_1^2, d_2^2, \ldots, d_n^2), \tag{19}$$

only local minima of the energy functional will be fixed points of the $P^{(0)}$-network. Since the memorized patterns provide the global minimum of the energy functional, they will necessarily be $P^{(0)}$-network fixed points. In addition, the $P^{(0)}$-network has no nonphysical fixed points, i.e. fixed points which are not local minima of the energy functional.

Let us show that the set of $P^{(0)}$-network fixed points has a structure which resembles the structure of the set of fixed points for Hopfield's model [?].

Theorem 3 



All $n$ neurons can be enumerated in such a way that all the $P^{(0)}$-network fixed points are of a "piecewise constant" kind:

$$\vec\sigma = (\varepsilon_1, \varepsilon_1, \ldots, \varepsilon_1, \varepsilon_2, \varepsilon_2, \ldots, \varepsilon_2, \ldots, \varepsilon_v, \varepsilon_v, \ldots, \varepsilon_v), \tag{20}$$

where $\varepsilon_i = \{\pm 1\}$. The number $v$, the composition and the dimensions $n_i$ of the constant-sign intervals are defined uniquely by the memorized patterns matrix $\Xi$ (16).

The proof of Theorem 3. For simplicity we assume that the number of neurons $n$ is much larger than the number of memorized patterns $p$. For definiteness, we assume that $n > 2^p$. Then it is evident that not all $n$ $p$-dimensional column-vectors

$$\vec\xi_i = \begin{pmatrix} \xi_i^{(1)} \\ \xi_i^{(2)} \\ \vdots \\ \xi_i^{(p)} \end{pmatrix}, \quad i = 1,2,\ldots,n, \tag{21}$$

of the matrix $\Xi$ will be different: some of them will be repeated.$^1$ Let us assume that the $n$ column-vectors $\vec\xi_i$ break up into $v$ groups, each consisting of identical vectors: $n_1$ identical vectors belong to the first group, $n_2$ identical vectors belong to the second group, and so on, with $n_1 + n_2 + \ldots + n_v = n$. Without loss of generality it may be assumed that the neurons of the network are numbered in such a way that the matrix $\Xi$ has the form

$$\Xi = \begin{pmatrix} \xi_1^{(1)} \cdots \xi_1^{(1)} & \xi_2^{(1)} \cdots \xi_2^{(1)} & \cdots & \xi_v^{(1)} \cdots \xi_v^{(1)} \\ \xi_1^{(2)} \cdots \xi_1^{(2)} & \xi_2^{(2)} \cdots \xi_2^{(2)} & \cdots & \xi_v^{(2)} \cdots \xi_v^{(2)} \\ \vdots & \vdots & & \vdots \\ \underbrace{\xi_1^{(p)} \cdots \xi_1^{(p)}}_{n_1} & \underbrace{\xi_2^{(p)} \cdots \xi_2^{(p)}}_{n_2} & \cdots & \underbrace{\xi_v^{(p)} \cdots \xi_v^{(p)}}_{n_v} \end{pmatrix}. \tag{22}$$

It was shown in [?] that when the projection matrix $P$ acts on the configuration vector $\vec\sigma$ the result is

$$P\vec\sigma = \sum_{l=1}^{p} m^{(l)}(\vec\sigma)\,\vec\xi^{(l)},$$

where $m^{(l)}(\vec\sigma)$ depends only on the state $\vec\sigma$. If $\vec\sigma^*$ is a fixed point of the $P^{(0)}$-network, its coordinates $\sigma_i^*$ must satisfy the system of equations

$$\mathrm{sign}\,(P\vec\sigma^* - D\vec\sigma^*)_i = \mathrm{sign}\left(\sum_{l=1}^{p} m^{(l)}(\vec\sigma^*)\,\xi_i^{(l)} - d_i^2\sigma_i^*\right) = \sigma_i^*, \quad i = 1,2,\ldots,n. \tag{23}$$

Given the form of the matrix $\Xi$ (see Eq. (22)), it is easy to understand that, for example, for the first $n_1$ values of the subscript $i$ all the sums $\sum_{l=1}^{p} m^{(l)}(\vec\sigma^*)\,\xi_i^{(l)}$ are the same:

$$\sum_{l=1}^{p} m^{(l)}(\vec\sigma^*)\,\xi_i^{(l)} = M, \quad i = 1,2,\ldots,n_1.$$

From here it immediately follows that the first $n_1$ coordinates of the fixed point $\vec\sigma^*$ must be equal. Indeed, let us assume to the contrary that, for example, $\sigma_1^* = 1$ and $\sigma_2^* = -1$. Then from Eqs. (23) it follows that the two inequalities

$$M \ge d_1^2, \qquad -M \ge d_2^2$$

have to be fulfilled simultaneously. But this is impossible, since $d_1^2$ and $d_2^2$ are both positive.

Consequently, the first $n_1$ coordinates of a $P^{(0)}$-network fixed point are equal to one another. Similarly, it can be proved that the next $n_2$ coordinates of the fixed point are equal to one another too, and so on up to the equality among the last $n_v$ coordinates. The Theorem is proved.
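The statement of Theorem 3 can be checked by enumeration on a small example. The sketch below uses two hypothetical 4-spin patterns, (1, 1, 1, 1) and (1, 1, 1, -1); the matrix `P0` is the projection matrix for these patterns with its diagonal removed, worked out by hand for this example. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def fixed_points(J):
    """All states with s_i * h_i >= 0 for every i, found by enumeration."""
    n = len(J)
    return [s for s in product((1, -1), repeat=n)
            if all(s[i] * sum(J[i][j] * s[j] for j in range(n)) >= 0
                   for i in range(n))]

def column_groups(Xi):
    """Group the neuron indices whose p-dimensional columns of Xi coincide."""
    groups = {}
    for i, col in enumerate(zip(*Xi)):
        groups.setdefault(col, []).append(i)
    return list(groups.values())

def is_piecewise_constant(s, groups):
    """True if s takes a single value on every group of identical columns."""
    return all(len({s[i] for i in g}) == 1 for g in groups)

Xi = [[1, 1, 1, 1],
      [1, 1, 1, -1]]
t = Fraction(1, 3)
# P - D for the patterns above (hand-computed for this example).
P0 = [[0, t, t, 0],
      [t, 0, t, 0],
      [t, t, 0, 0],
      [0, 0, 0, 0]]

groups = column_groups(Xi)
fps = fixed_points(P0)
print(groups)  # [[0, 1, 2], [3]]
print(fps)     # [(1, 1, 1, 1), (1, 1, 1, -1), (-1, -1, -1, 1), (-1, -1, -1, -1)]
print(all(is_piecewise_constant(s, groups) for s in fps))  # True
```

Both memorized patterns appear among the four fixed points, and every fixed point is constant on the group of the first three neurons, whose columns in $\Xi$ coincide.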



$^1$ The $p$-dimensional column-vectors $\vec\xi_i$ (21) are labelled with subscripts, in contrast to the $n$-dimensional memorized patterns $\vec\xi^{(l)}$ (15), which are labelled with superscripts.



3). Thus, the $P^{(0)}$-network possesses some attractive properties. Namely, all the memorized patterns are necessarily its fixed points. Moreover, all the fixed points are, firstly, local minima of the energy functional and, secondly, of the "piecewise constant" kind given by Eq. (20). It is known [?] that the last two of these properties are also typical for the fixed points of the Hopfield model, that is, for a network with the Hebb connection matrix. In the following discussion we have in mind either the $P^{(0)}$-network or the Hopfield model.

While Eq. (20) is only a necessary condition for a configuration to be a fixed point of a network, it considerably narrows the circle of "applicants". Let us provide some estimates.

If $v$ is the number of different column-vectors of the matrix $\Xi$ and $q$ is the number of fixed points of the related network, it is clear that $q \le 2^v$. Next, when the number of memorized patterns $p$ is given, a natural estimate for $v$ is $v \le 2^p$. But, as was shown in [?], actually we have

$$q \le 2^{2^{p-1}}. \tag{24}$$

Eq. (24) bounds the number of fixed points from above. It is easy to verify that when $p = 2$ the inequality (24) becomes an equality, and $q$ equals exactly 4. But even for $p = 3$ the right-hand side of Eq. (24) is 16, and in [?] it was shown that for $p = 3$ the maximum possible number of network fixed points is 14. This result coincides with that of [?]. Other estimates for the number of fixed points can be found in [?].
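The numbers quoted for the bound (24) are easy to reproduce. The snippet below evaluates the right-hand side $2^{2^{p-1}}$ next to the cruder estimate $2^{2^p}$ that follows from $q \le 2^v$ and $v \le 2^p$; the function names are illustrative.

```python
def crude_bound(p: int) -> int:
    """Estimate obtained by combining q <= 2**v with v <= 2**p."""
    return 2 ** (2 ** p)

def fixed_point_bound(p: int) -> int:
    """Right-hand side of Eq. (24): 2**(2**(p - 1))."""
    return 2 ** (2 ** (p - 1))

for p in (2, 3):
    print(p, fixed_point_bound(p), crude_bound(p))
# 2 4 16
# 3 16 256
```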